
SyntheticControl: in-space placebo permutation inference + reporting-stack integration#511

Merged
igerber merged 3 commits into main from feature/synthetic-control-placebo on May 31, 2026


@igerber (Owner) commented May 31, 2026

Summary

  • Implements ADH 2010 §2.4 in-space placebo permutation inference for the classic SyntheticControl estimator (PR-1, #501, added the classic SCM estimator of Abadie-Diamond-Hainmueller 2010 but shipped it with no analytical SE) and wires SyntheticControlResults into the practitioner / DiagnosticReport / BusinessReport reporting stack.
  • SyntheticControlResults.in_space_placebo(): reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the other J−1 donors (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matches SCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE ratio among the J+1 units.
  • New fields: placebo_p_value (= rank/(n_placebos+1); an upper-tail rank test on the unsigned RMSPE-ratio statistic, so it is direction-agnostic and detects an effect of either sign; ties are counted via >=), rmspe_ratio (treated statistic, set at fit), n_placebos/n_failed (effective reference-set sizes; non-converged placebos are excluded from BOTH numerator and denominator). placebo_p_value is a separate field from the always-NaN analytical p_value: it has no SE/t-stat, does not flow through safe_inference, and is_significant stays bound to p_value.
  • Fail-closed convergence contract: a valid fit requires BOTH inner Frank-Wolfe AND outer-V convergence of the selected incumbent; the treated fit fails closed and placebos are excluded on either failure. The Powell polish validates the incumbent only when it converges back AT the incumbent's objective level (np.isclose), never on a success at a strictly-worse point.
  • Edge cases fail closed: scale-aware RMSPE-ratio floor (perfect pre-fit → finite ratio, not inf), J<2 → NaN+warn, J==2 → degenerate+coarse warn, deterministic given seed. get_placebo_df() returns the per-unit RMSPE-ratio table (incl. treated row + failed donors). Placebo compute is opt-in; each fit retains a pickle-excluded _SyntheticControlFitSnapshot of the pivoted panel.
  • Reporting integration: SCM routed like TROP (parallel-trends-free, fit-based) — new scm_fit PT analogue (design_enforced_pt verdict reading pre_rmspe), a _scm_native block surfacing pre_rmspe + donor-weight concentration + the placebo p-value when already computed (never triggering the refit implicitly), a practitioner _handle_synthetic_control with the placebo as the headline significance step, and a BusinessReport fit-based assumption block with ADH 2010 attribution. Also fixes a latent BR bug where headline is_significant was a non-JSON-serializable numpy bool_ when p_value is numpy NaN.
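To make the rank rule concrete, here is a minimal Python sketch of the statistic and p-value described above, assuming each placebo result is either a finite post/pre RMSPE ratio or `None` for a non-converged refit. The function names and the floor rule (`rel_floor * scale`) are illustrative assumptions, not diff_diff's implementation.

```python
import math
from typing import Optional, Sequence, Tuple

import numpy as np


def rmspe(gaps) -> float:
    """Root mean squared prediction error of treated-minus-synthetic gaps."""
    gaps = np.asarray(gaps, dtype=float)
    return float(np.sqrt(np.mean(gaps ** 2)))


def rmspe_ratio(pre_gaps, post_gaps, scale: float, rel_floor: float = 1e-8) -> float:
    """Post/pre RMSPE ratio with a scale-aware floor on the pre-period RMSPE.

    ASSUMPTION: the floor rule (rel_floor * scale) is illustrative only; it keeps
    a perfect pre-period fit from producing an infinite ratio.
    """
    floor = rel_floor * max(abs(scale), np.finfo(float).tiny)
    return rmspe(post_gaps) / max(rmspe(pre_gaps), floor)


def placebo_p_value(
    treated_ratio: float, placebo_ratios: Sequence[Optional[float]]
) -> Tuple[float, int, int]:
    """Upper-tail rank p-value of the treated ratio among converged placebos.

    Failed placebos (None or non-finite) are dropped from both the count of
    placebos at least as extreme and the reference-set size.
    """
    valid = [r for r in placebo_ratios if r is not None and math.isfinite(r)]
    n_placebos = len(valid)
    n_failed = len(placebo_ratios) - n_placebos
    if n_placebos == 0:
        return math.nan, n_placebos, n_failed  # nothing to rank against
    # Rank 1 = most extreme; ties count against the treated unit (>=).
    rank = 1 + sum(1 for r in valid if r >= treated_ratio)
    return rank / (n_placebos + 1), n_placebos, n_failed


p, n_placebos, n_failed = placebo_p_value(5.0, [1.2, 6.0, None, 0.8])
print(p, n_placebos, n_failed)  # 0.5 3 1
```

With J = 2 donors there are at most two placebos, so the smallest attainable p-value is 1/3; that coarseness is what the J==2 warning flags.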

Methodology references (required if estimator / math changes)

  • Method name(s): Classic Synthetic Control in-space placebo permutation inference (RMSPE-ratio rank test).
  • Paper / source link(s): Abadie, Diamond & Hainmueller (2010), JASA 105(490) §2.4 / §3.4; donor-pool construction cross-checked against SCtools::generate.placebos (CRAN). Documented in docs/methodology/REGISTRY.md §SyntheticControl.
  • Any intentional deviations from the source (and why): (1) The placebo donor pool excludes the real treated unit (J→J−1) — documented as a **Note:** (its post-period is treatment-contaminated; matches SCtools). (2) The reported rmspe_ratio is on the root scale; ADH §3.4 reports the MSPE ratio (the square) — monotone-equivalent, so rank/p-value are identical (documented). (3) placebo_p_value is kept separate from the analytical p_value/is_significant (no analytical SE exists for classic SCM) — documented as the non-analytical-p-value split. (4) Scale-aware RMSPE-ratio floor + non-converged-placebo exclusion from numerator AND denominator — documented **Note:** labels. In-time placebo / leave-one-out (ADH 2015) are out of scope and tracked in TODO.md.
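Why deviation (2) cannot change the result: the RMSPE ratio is the square root of the MSPE ratio, and squaring is strictly increasing on non-negative values, so every unit's upper-tail rank is the same on either scale. A small check with made-up ratios (the helper name is hypothetical):

```python
import numpy as np


def upper_tail_rank(stats: np.ndarray, i: int) -> int:
    """Number of units whose statistic is >= unit i's (1 = most extreme)."""
    return int(np.sum(stats >= stats[i]))


# Hypothetical post/pre RMSPE ratios for four units (root scale, all >= 0).
rmspe_ratios = np.array([3.1, 0.9, 4.2, 1.7])
# RMSPE ratio = sqrt(MSPE_post) / sqrt(MSPE_pre), so squaring gives the MSPE ratio.
mspe_ratios = rmspe_ratios ** 2

same_ranks = all(
    upper_tail_rank(rmspe_ratios, i) == upper_tail_rank(mspe_ratios, i)
    for i in range(len(rmspe_ratios))
)
print(same_ranks)  # True
```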

Validation

  • Tests added/updated: tests/test_methodology_synthetic_control.py (~19 placebo + convergence tests: infeasible donor counts, treated-fit non-convergence, Powell-succeeds-at-worse-point regression, failed-placebo exclusion, deterministic reruns, pickle behavior, custom-V path, get_placebo_df shape/status); tests/test_diagnostic_report.py + tests/test_business_report.py (SCM-native routing, scm_fit prose, target-parameter, robustness, infeasible/precomputed-rejection); tests/test_practitioner.py (placebo step dispatch). Full local run: 428 passed, 1 skipped (pure-Python) across the four affected files.
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial in this PR (ADH-2015 diagnostics + a tutorial are tracked follow-ups).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

…stack integration

Implements ADH 2010 §2.4 in-space placebo inference for the classic
SyntheticControl estimator (PR-1 / #501 shipped the estimator with no
analytical SE), and wires SyntheticControlResults into the practitioner /
DiagnosticReport / BusinessReport reporting stack.

Methodology (diff_diff/synthetic_control.py, synthetic_control_results.py):
- SyntheticControlResults.in_space_placebo(): reassigns treatment to each
  donor, refits a synthetic control for that pseudo-treated donor against the
  other J-1 donors (the real treated unit is excluded from every placebo pool
  -- its post-period is treatment-contaminated; matches
  SCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE
  ratio among the J+1 units.
- New fields: placebo_p_value (= rank/(n_placebos+1); an upper-tail rank test on
  the unsigned RMSPE-ratio statistic, ties via >=), rmspe_ratio (treated
  statistic, set at fit), n_placebos/n_failed (effective reference-set sizes;
  non-converged placebos excluded from BOTH numerator and denominator).
- placebo_p_value is a SEPARATE field from the always-NaN analytical p_value;
  it carries no SE/t-stat and does not flow through safe_inference;
  is_significant stays bound to p_value.
- Fail-closed contract: a valid fit requires BOTH inner Frank-Wolfe AND outer-V
  convergence of the SELECTED incumbent; treated fit fails closed and placebos
  are excluded on either failure. The Powell polish validates the incumbent
  only when it converges back AT the incumbent's objective level (np.isclose),
  never on a success at a strictly-worse point.
- Edge cases: scale-aware RMSPE-ratio floor (perfect pre-fit -> finite ratio),
  J<2 -> NaN+warn, J==2 -> degenerate+coarse warn, deterministic given seed.
- get_placebo_df() returns the per-unit RMSPE-ratio table (incl. treated row +
  failed donors). Placebo compute is opt-in; each fit retains a
  _SyntheticControlFitSnapshot of the pivoted panel (excluded from pickling).

Reporting (diagnostic_report.py, practitioner.py, business_report.py,
_reporting_helpers.py): SCM routed like TROP (parallel-trends-free, fit-based).
New scm_fit PT analogue (design_enforced_pt verdict reading pre_rmspe), a
_scm_native diagnostic block surfacing pre_rmspe + donor-weight concentration +
the placebo p-value when already computed (never triggering the refit
implicitly), a practitioner _handle_synthetic_control with the placebo as the
headline significance step, and a BusinessReport fit-based assumption block with
ADH 2010 attribution. Also fixes a latent BR bug where headline is_significant
was a non-JSON-serializable numpy bool_ when p_value is numpy NaN.
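The JSON fix can be illustrated as follows. A comparison involving a NumPy scalar returns `numpy.bool_`, which the standard `json` encoder rejects; coercing to a built-in `bool` avoids that. The helper name and the NaN-to-`False` choice are assumptions for illustration, not necessarily what BusinessReport does.

```python
import json
import math

import numpy as np


def headline_is_significant(p_value, alpha: float = 0.05) -> bool:
    """Return a plain Python bool so the headline stays JSON-serializable.

    ASSUMPTION: a NaN p-value is reported as not significant (False).
    """
    p = float(p_value)
    if math.isnan(p):
        return False
    return bool(p < alpha)


# The latent bug: comparing a numpy NaN yields numpy.bool_, which json rejects.
raw = np.float64(np.nan) < 0.05
try:
    json.dumps({"is_significant": raw})
except TypeError as exc:
    print("json error:", exc)

print(json.dumps({"is_significant": headline_is_significant(np.float64(np.nan))}))
# {"is_significant": false}
```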

Docs: REGISTRY.md §SyntheticControl (new **Note:** labels for donor-pool
construction, failure handling, RMSPE-ratio floor, non-analytical-p-value
split), REPORTING.md, docs/api/synthetic_control.rst, LLM guides, README,
CHANGELOG. TODO.md tracks the ADH-2015 follow-up (in-time placebo / LOO) and a
compact/lazy snapshot representation.

Tests: ~19 placebo + convergence tests (infeasible donor counts, treated-fit
non-convergence, Powell-succeeds-at-worse-point, failed-placebo exclusion,
deterministic reruns, pickle behavior) + SCM reporting-routing tests across all
three tools.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings. Against the local ADH 2010 registry/docstrings, the new SCM placebo path is source-faithful; the remaining issues are non-blocking follow-ups.

Executive Summary

  • The methodology-sensitive SCM changes match the documented ADH 2010 contract in the registry; the notable deviations are explicitly labeled and therefore not defects.
  • P2: SyntheticControlResults.summary() misstates why placebo inference was infeasible when the treated fit itself failed to converge.
  • P2: in_space_placebo(n_starts=...) does not validate the override and can silently coerce invalid values into a one-start placebo run.
  • P3 tracked: the always-on _SyntheticControlFitSnapshot memory cost is already captured in TODO.md.
  • I could not execute the test suite in this sandbox because pytest and scientific dependencies are unavailable here.

Methodology

  • P3 informational The donor-pool exclusion, root-scale RMSPE ratio, failed-placebo exclusion from the rank denominator, and keeping placebo_p_value separate from analytical p_value/headline reporting are all explicitly documented in docs/methodology/REGISTRY.md:L1997-L2000. Impact: none; these are documented notes/deviations, not defects. Concrete fix: none.

Code Quality

  • P2 SyntheticControlResults.summary() infers every infeasible placebo run from n_placebos/n_failed alone, so the treated-fit fail-closed branch is later narrated as “too few donors or all donor refits failed,” which is false for that path. See diff_diff/synthetic_control_results.py:L324-L331 and the treated-fit fail-closed branch at diff_diff/synthetic_control_results.py:L576-L592. Impact: misleading diagnosis for the exact non-convergence case this PR introduces. Concrete fix: persist an explicit placebo status/reason and render that in summary() instead of reconstructing the reason from counts.
  • P2 The new in_space_placebo(n_starts=...) override skips the positive-integer validation that the estimator constructor already enforces. diff_diff/synthetic_control_results.py:L550-L553 accepts 0/negative values via int(...), while diff_diff/synthetic_control.py:L175-L176 rejects them at estimator construction. Impact: a bad override silently changes the permutation procedure instead of failing fast. Concrete fix: mirror the constructor validation and add a regression for invalid overrides.
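One way to mirror a positive-integer check for the override, sketched under assumptions: the constructor's exact rules are not shown here, so accepting any `numbers.Integral` (including NumPy integers) and rejecting `bool` are choices of this sketch, and `validate_n_starts` is a hypothetical name.

```python
import numbers


def validate_n_starts(n_starts) -> int:
    """Hypothetical positive-integer check for an n_starts override.

    Rejects 0, negatives, non-integers such as 2.5, and bools, instead of
    letting int(...) silently coerce them.
    """
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(n_starts, bool) or not isinstance(n_starts, numbers.Integral):
        raise ValueError(f"n_starts must be a positive integer, got {n_starts!r}")
    if n_starts < 1:
        raise ValueError(f"n_starts must be a positive integer, got {n_starts!r}")
    return int(n_starts)
```

Running such a check before touching any placebo fields is what lets an invalid override fail fast while leaving existing placebo state as it was.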

Performance

  • No findings.

Maintainability

  • No findings beyond the P2 state/reporting issue above.

Tech Debt

  • P3 informational Every SCM fit now retains a full _SyntheticControlFitSnapshot; this memory cost is already tracked in TODO.md:L166. Impact: higher per-fit memory even when in_space_placebo() is never called. Concrete fix: none required for approval; the existing TODO is sufficient.
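The snapshot is described as retained in memory but excluded from pickling. A minimal sketch of that pattern, assuming the standard `__getstate__` hook and a hypothetical `_fit_snapshot` attribute (not necessarily how `_SyntheticControlFitSnapshot` is wired in diff_diff):

```python
import pickle
from typing import Any, Dict, Optional


class ResultsSketch:
    """Hypothetical results object that keeps a refit snapshot in memory only."""

    def __init__(self, att: float, snapshot: Optional[Any] = None):
        self.att = att
        self.placebo_status: Optional[str] = None  # small string: survives pickling
        self._fit_snapshot = snapshot  # large pivoted panel for opt-in refits

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        state["_fit_snapshot"] = None  # drop the snapshot from the pickle payload
        return state


live = ResultsSketch(att=1.5, snapshot=[[0.0] * 50 for _ in range(200)])
restored = pickle.loads(pickle.dumps(live))
print(restored.att, restored._fit_snapshot)  # 1.5 None
```

The in-memory object keeps its snapshot; only the pickled copy loses it, so the memory cost tracked in TODO.md applies to live results, not to serialized ones.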

Security

  • No findings.

Documentation/Tests

  • No unmitigated doc/test blocker. The added tests cover the main SCM placebo/reporting paths, but the two P2 cases above do not appear to have dedicated regressions. Impact: those edge cases can regress quietly. Concrete fix: add one test for summary() after treated-fit non-convergence and one validation test for invalid n_starts overrides.

…ty reason + n_starts validation

CI review (PR #511) flagged two P2s:

1. summary() misdiagnosed the infeasible-placebo reason. The treated-fit
   fail-closed branch and the J<2 branch both leave n_placebos=0, n_failed=0,
   so reconstructing the cause from counts narrated a non-converged treated fit
   as "too few donors or all donor refits failed" (false for that path). Add an
   explicit `_placebo_status` recorded by in_space_placebo()
   ({treated_fit_nonconverged, too_few_donors, all_placebos_failed, ran}) and
   render the specific reason in both summary() and DiagnosticReport._scm_native
   instead of inferring it from counts. The status is a small string, so it
   survives pickling alongside _placebo_df / the scalar fields.

2. in_space_placebo(n_starts=...) skipped the positive-integer validation the
   estimator constructor enforces — int(0)/int(-1)/int(2.5) silently coerced
   into a degenerate or invalid permutation procedure. Mirror the constructor's
   check and fail fast with a ValueError.
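A sketch of rendering the recorded status instead of reconstructing it from counts. The four status strings come from the commit message above; the reason wording and the `placebo_reason` helper are hypothetical.

```python
# Hypothetical reason strings keyed by the four statuses named in the commit.
PLACEBO_REASONS = {
    "treated_fit_nonconverged": "treated-unit fit did not converge; placebos skipped",
    "too_few_donors": "fewer than 2 donors; no placebo pool can be formed",
    "all_placebos_failed": "every donor refit failed to converge",
    "ran": "in-space placebo inference ran",
}


def placebo_reason(status: str) -> str:
    """Render the stored status directly rather than inferring it from counts."""
    try:
        return PLACEBO_REASONS[status]
    except KeyError:
        raise ValueError(f"unknown placebo status: {status!r}") from None


# Both paths leave n_placebos == 0 and n_failed == 0, so counts alone cannot
# tell them apart; the stored status can.
print(placebo_reason("treated_fit_nonconverged"))
print(placebo_reason("too_few_donors"))
```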

Also relabels the lingering "One-sided rank" code comment to "upper-tail rank
on the unsigned RMSPE ratio" (matching the REGISTRY/CHANGELOG wording fix).

New regressions: summary() names the treated-fit-failure reason (not the
donor-side reason) under identical n_placebos/n_failed counts; invalid n_starts
overrides raise and leave placebo state untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 1d188638a204737796ab1d21a378da393d3841c9


Overall Assessment

✅ Looks good — this re-review resolves the prior P2s, and I did not find any unmitigated P0/P1 issues in the changed SCM estimator, placebo, or reporting paths.

Executive Summary

  • The methodology-sensitive SCM changes are consistent with the updated registry and the in-code ADH 2010/SCtools contract.
  • The two prior re-review items are fixed: summary() now keys off _placebo_status, and in_space_placebo(n_starts=...) now fails fast on invalid overrides in diff_diff/synthetic_control_results.py:336 and diff_diff/synthetic_control_results.py:583.
  • The donor-pool exclusion, root-scale RMSPE ratio, effective-denominator handling for failed placebos, and separate placebo_p_value surface are all explicitly documented in docs/methodology/REGISTRY.md:1987 and docs/methodology/REGISTRY.md:1997.
  • P3: the always-on _SyntheticControlFitSnapshot memory cost remains, but it is already tracked in TODO.md:166, so it is non-blocking under the review rubric.
  • P3: the new SCM diagnostic-report fixture installs a process-wide warnings ignore filter in tests/test_diagnostic_report.py:160, which can weaken warning-regression coverage.
  • I could not execute the changed tests here because pytest is not installed in this sandbox.

Methodology

  • P3 informational The methodology-sensitive SCM behavior matches the documented contract: no analytical SE, opt-in in-space placebo inference, treated-unit exclusion from placebo donor pools, root-scale RMSPE ratio, and failed-placebo exclusion from the effective rank denominator are all explicitly documented notes/deviations in docs/methodology/REGISTRY.md:1987 and docs/methodology/REGISTRY.md:1997. Impact: none. Concrete fix: none.

Code Quality

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 informational The PR keeps a full _SyntheticControlFitSnapshot on every SCM fit to support opt-in placebo refits, which increases per-fit memory even when placebo inference is never used; this limitation is already tracked in TODO.md:166. Impact: higher steady-state memory for SCM results. Concrete fix: none required for approval; follow the tracked compact/lazy snapshot follow-up.

Security

  • No findings.

Documentation/Tests

  • P3 The new scm_fit fixture in tests/test_diagnostic_report.py:160 calls warnings.filterwarnings("ignore"), which mutates global warning filters rather than scoping suppression to the fixture setup. Impact: later tests in the same worker can silently stop observing warning regressions. Concrete fix: wrap only the synthetic_control(...) call in warnings.catch_warnings() plus warnings.simplefilter("ignore"), or use a targeted pytest.mark.filterwarnings on the affected tests.
  • Static-review note: I could not run the changed test targets here because pytest is unavailable in this environment.

…lobal filter mutation

CI review (PR #511) P3: the new scm_fit fixture called
warnings.filterwarnings("ignore"), which permanently mutates the process-wide
warning filters and can silently weaken warning-regression coverage in later
tests in the same worker. Wrap only the synthetic_control(...) call in
warnings.catch_warnings() + simplefilter("ignore") so the suppression is scoped
to the fixture's solver call. (Pre-existing sibling fixtures in this file use
the same legacy global-filter pattern but are out of scope for this PR.)
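The scoped suppression can be sketched with the standard library alone. `noisy_solver_call` is a hypothetical stand-in for the fixture's `synthetic_control(...)` call; the point is that `warnings.catch_warnings()` restores the process-wide filters on exit, so later tests still see warnings.

```python
import warnings


def noisy_solver_call() -> float:
    """Hypothetical stand-in for the synthetic_control(...) call in the fixture."""
    warnings.warn("outer V search hit its iteration cap", RuntimeWarning)
    return 1.0


def scm_fit_fixture_body() -> float:
    # Filters changed inside the block are restored when it exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return noisy_solver_call()


before = list(warnings.filters)
result = scm_fit_fixture_body()
print(warnings.filters == before)  # True: global filters unchanged

# A later test can still observe the same warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_solver_call()
print(len(caught))  # 1
```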

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 44ea0b409ca9bd96b33e503edfc5bdb7a39b1e50


Overall Assessment

✅ Looks good — I did not find any unmitigated P0/P1 issues in the changed SCM placebo-inference or reporting-stack paths.

Executive Summary

  • This PR changes classic SyntheticControl inference/reporting behavior, specifically the in-space placebo permutation path and its DiagnosticReport / BusinessReport / practitioner routing.
  • I did not find any undocumented methodology mismatches, missing assumption checks, or incorrect SE/inference handling in the changed SCM code.
  • The prior re-review issues are resolved: summary() now keys off _placebo_status, invalid in_space_placebo(n_starts=...) overrides fail fast, and the scm_fit fixture warning suppression is now scoped in diff_diff/synthetic_control_results.py:L332-L361, diff_diff/synthetic_control_results.py:L580-L588, and tests/test_diagnostic_report.py:L159-L168.
  • The remaining always-on _SyntheticControlFitSnapshot memory cost and the deferred extra SCM diagnostics are explicitly tracked in TODO.md:L166 and TODO.md:L88, so they are non-blocking under the review rubric.
  • I could not run the targeted tests here because pytest is not installed in this environment.

Methodology

  • Severity: P3 informational. The SCM placebo implementation is consistent with the source-paper inference framing: analytical inference remains unavailable, and significance is assessed by comparing the treated unit’s post/pre RMSPE ratio against the placebo distribution. The changed behaviors I checked here — separate non-analytical placebo_p_value, treated-fit fail-closed handling, root-scale rmspe_ratio, and effective-denominator exclusion of failed placebos — are explicitly documented in docs/methodology/REGISTRY.md:L1987-L2000 and implemented in diff_diff/synthetic_control.py:L393-L545 and diff_diff/synthetic_control_results.py:L507-L729. Impact: none. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 informational. The always-on SCM fit snapshot remains a memory tradeoff, but it is now explicitly tracked in TODO.md:L166. Impact: higher per-fit memory even when in_space_placebo() is never called. Concrete fix: none required for approval; follow the tracked compact/lazy snapshot item.
  • Severity: P3 informational. The remaining non-landed SCM diagnostics are explicitly deferred in TODO.md:L88. Impact: the reporting stack currently surfaces pre_rmspe plus in-space placebo only. Concrete fix: none required for approval; handle in the tracked follow-up.

Security

  • No findings.

Documentation/Tests

  • No findings on the changed docs/tests. The prior warning-filter regression is fixed by the scoped suppression in tests/test_diagnostic_report.py:L159-L168. Impact: none. Concrete fix: none.
  • Static-review note: I could not execute tests/test_methodology_synthetic_control.py, tests/test_diagnostic_report.py, tests/test_business_report.py, or tests/test_practitioner.py because pytest is not installed here.

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) May 31, 2026
@igerber igerber merged commit 1d2ec02 into main May 31, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/synthetic-control-placebo branch May 31, 2026 21:08
@igerber igerber mentioned this pull request Jun 1, 2026

1 participant